[Storage Index Adapter] Fix index template creation on Serverless #264760

Merged
viduni94 merged 8 commits into elastic:main from viduni94:kbn-evals-fix-datasert-upserts
Apr 22, 2026

@viduni94
Contributor

@viduni94 viduni94 commented Apr 21, 2026

Closes #264845

Summary

Fixes index template creation on Serverless for the indices kibana-evaluation-datasets and kibana-evaluation-dataset-examples.

PR #263096 added auto_expand_replicas and number_of_shards to index templates in StorageIndexAdapter. Serverless ES rejects these settings on non-system indices with an illegal_argument_exception. Hidden indices (e.g., those used by Streams) are unaffected because Kibana manages them as system indices.
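
As a rough illustration of the idea, the template settings can be built conditionally so that the two rejected settings are only sent outside Serverless. This is a minimal sketch, not the adapter's actual code: the helper name `buildTemplateSettingsSketch`, the settings interface, and the values `'0-1'` and `1` are assumptions made for the example.

```typescript
// Hypothetical shape for the subset of index settings discussed in this PR.
interface TemplateSettingsSketch {
  auto_expand_replicas?: string;
  number_of_shards?: number;
}

// Hypothetical helper: returns the settings block for an index template.
// On Serverless, the two settings that ES rejects are left out entirely.
function buildTemplateSettingsSketch(isServerless: boolean): TemplateSettingsSketch {
  if (isServerless) {
    return {};
  }
  // Example values only; the real adapter may use different ones.
  return { auto_expand_replicas: '0-1', number_of_shards: 1 };
}
```

Omitting the keys, rather than sending them with default values, matters here because the error message indicates Serverless rejects the settings by name.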

Dataset upsert error for Kibana evaluation runs

Error in logs:

Failed to upsert evaluation dataset: ResponseError: illegal_argument_exception
	Root causes:
		illegal_argument_exception: Settings [index.auto_expand_replicas,index.number_of_shards] are not available when running in serverless mode

Fix

The adapter now detects serverless environments in three tiers before applying index template settings:

  • Explicit - a new isServerless option in StorageIndexAdapterOptions. When it is provided, the adapter skips or includes the settings without making any extra calls.
  • Proactive - if isServerless is not provided, the adapter calls esClient.info() on the first write and checks version.build_flavor. The result is cached for the adapter's lifetime.
  • Reactive - if neither of the above is available (e.g., info() fails due to insufficient privileges), the adapter catches the illegal_argument_exception on the first write, retries without the settings, and caches the result.
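
The three tiers can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the code from this PR: `ServerlessDetectorSketch`, `MinimalEsClient`, `putTemplate`, and `isServerlessSettingsError` are hypothetical names, the client interface is reduced to the single `info()` call, and matching the error by message text is an assumption based on the log output above.

```typescript
// Minimal stand-in for the part of the ES client this sketch needs.
interface MinimalEsClient {
  info(): Promise<{ version: { build_flavor: string } }>;
}

// Assumption: the rejection is recognised by the text seen in the logs.
function isServerlessSettingsError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return message.includes('illegal_argument_exception') && message.includes('serverless mode');
}

class ServerlessDetectorSketch {
  private readonly esClient: MinimalEsClient;
  private cached: boolean | undefined;

  // Tier 1: an explicit flag, when given, is used as-is.
  constructor(esClient: MinimalEsClient, isServerless?: boolean) {
    this.esClient = esClient;
    this.cached = isServerless;
  }

  // Tier 2: ask the cluster once and cache the answer.
  private async detect(): Promise<boolean | undefined> {
    if (this.cached !== undefined) return this.cached;
    try {
      const info = await this.esClient.info();
      this.cached = info.version.build_flavor === 'serverless';
    } catch {
      // info() failed (e.g. missing privileges); stay undecided.
    }
    return this.cached;
  }

  // Tier 3: if still undecided, try with settings and retry without them
  // when the cluster rejects them as serverless-only.
  async putTemplate(put: (includeSettings: boolean) => Promise<void>): Promise<void> {
    const serverless = await this.detect();
    try {
      await put(serverless !== true);
    } catch (error) {
      if (serverless !== undefined || !isServerlessSettingsError(error)) throw error;
      this.cached = true;
      await put(false);
    }
  }
}
```

In this sketch the reactive retry only runs when detection was inconclusive, so an unrelated failure on a cluster already known to be stateful or serverless is rethrown rather than masked.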

The Evals plugin passes isServerless explicitly because the evals route handler creates StorageIndexAdapter with esClient.asCurrentUser, which is scoped to the caller's API key. This API key may lack the monitor cluster privilege needed for esClient.info(), making tier 2 unreliable. The plugin therefore takes buildFlavor from the plugin context and passes it on.
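
A caller that already knows the build flavor could derive the explicit flag along these lines. This is a hedged sketch only: `BuildFlavorSketch`, `AdapterOptionsSketch`, and `toAdapterOptionsSketch` are hypothetical names, and the `'serverless' | 'traditional'` union is an assumption about the value the plugin context exposes.

```typescript
// Assumed set of build flavor values available from the plugin context.
type BuildFlavorSketch = 'serverless' | 'traditional';

// Reduced stand-in for StorageIndexAdapterOptions; only the new option is shown.
interface AdapterOptionsSketch {
  isServerless?: boolean;
}

// Hypothetical helper: turn the known build flavor into the explicit option,
// so the adapter never needs to call esClient.info() with a scoped client.
function toAdapterOptionsSketch(buildFlavor: BuildFlavorSketch): AdapterOptionsSketch {
  return { isServerless: buildFlavor === 'serverless' };
}
```

Passing the flag this way keeps the decision independent of the privileges attached to the caller's API key.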

Test Plan

  • Deploy the fix to a serverless project from this PR
  • Create a config file (e.g., config.testcluster.json) and add the serverless project URL as the dataset target
  • Run evals with node scripts/evals start --suite significant-events --project eis-anthropic-claude-4-6-sonnet --judge eis-google-gemini-3-1-pro --export-profile local --datasets-profile testcluster

With this fix, the dataset upsert works as expected

Checklist

  • Unit or functional tests were updated or added to match the most common scenarios
  • The PR description includes the appropriate Release Notes section, and the correct release_note:* label is applied per the guidelines
  • Review the backport guidelines and apply applicable backport:* labels.

@viduni94 viduni94 requested review from a team as code owners April 21, 2026 15:44
@viduni94 viduni94 added release_note:skip Skip the PR/issue when compiling release notes backport:skip This PR does not require backporting ci:project-deploy-observability Create an Observability project labels Apr 21, 2026
@github-actions
Contributor

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@viduni94 viduni94 requested a review from flash1293 April 21, 2026 15:50
@viduni94 viduni94 requested a review from a team as a code owner April 21, 2026 15:54
Comment thread config/serverless.oblt.yml Outdated
# Disable the embedded Dev Console
console.ui.embeddedEnabled: false

xpack.evals.enabled: true
Contributor Author

Added this to test the changes against the serverless test deployment from this PR as the evals golden cluster is broken.

I will be removing this line before merging this PR.

@flash1293
Contributor

Thanks @viduni94, this is a very good catch - I'm very surprised our existing tests didn't capture it though... what's going on here? Do you happen to know, @rudolf?

In terms of the fix - we should proactively check whether we are running in serverless mode and decide based on that instead of waiting for the error and reacting to it.

@viduni94
Contributor Author

> Thanks @viduni94, this is a very good catch - I'm very surprised our existing tests didn't capture it though... what's going on here? Do you happen to know, @rudolf?
>
> In terms of the fix - we should proactively check whether we are running in serverless mode and decide based on that instead of waiting for the error and reacting to it.

Thanks @flash1293
Updated the fix to be proactive.
Let me know what you think.

@elastic-vault-github-plugin-prod
Contributor

elastic-vault-github-plugin-prod Bot commented Apr 21, 2026

Run Metadata

  • Triggered by: Issue #250
  • Elasticsearch image tag: 9.5.0-SNAPSHOT
  • Kibana image: docker.elastic.co/kibana-ci/kibana-serverless:pr-264760-05c6a8c37647
  • Date: 2026-04-21
  • PR: elastic/kibana#264760 — [Storage Index Adapter] Fix index template creation on Serverless
  • Mode: PR-targeted (journeys generated from PR diff)
  • Journeys executed: 8
  • Passed: 8
  • Errored: 0
  • Findings: 0 bugs, 0 warnings, 0 info

Findings

No findings -- all journeys completed without issues.


Screenshots

Screenshots are available in the workflow artifacts.


Workflow run: https://github.com/elastic/kibana-exploratory-testing/actions/runs/24749151549

@flash1293
Contributor

Thanks @viduni94, definitely better - we'll need to wait for the core team of course for how to proceed here. I'm still puzzled it worked so far... we have lots of tests running serverless that should catch this exact kind of thing, right?

@viduni94 viduni94 marked this pull request as draft April 21, 2026 17:58
@viduni94
Contributor Author

/ci

@viduni94 viduni94 self-assigned this Apr 21, 2026
@viduni94
Contributor Author

> Thanks @viduni94, definitely better - we'll need to wait for the core team of course for how to proceed here. I'm still puzzled it worked so far... we have lots of tests running serverless that should catch this exact kind of thing, right?

@flash1293 I had to update the implementation slightly to pass the build flavour from the evals plugin as esClient.info() doesn't work as expected when executed in the scope of the user's context/permissions.
Let me know what you think. Tested that this works in the test serverless deployment on this PR.

@viduni94 viduni94 marked this pull request as ready for review April 21, 2026 20:41
@viduni94 viduni94 requested review from a team as code owners April 21, 2026 20:41
@elasticmachine
Contributor

elasticmachine commented Apr 21, 2026

💛 Build succeeded, but was flaky

  • Buildkite Build
  • Commit: 05c6a8c
  • Kibana Serverless Image: docker.elastic.co/kibana-ci/kibana-serverless:pr-264760-05c6a8c37647

Failed CI Steps

Test Failures

  • [job] [logs] affected Scout: [ platform / streams_app-stateful-classic ] plugin / local-stateful-classic - Stream data routing - creating routing rules - should show validation errors for invalid stream names verified on the client
  • [job] [logs] Jest Tests #3 / SelectedFilters should render properly
  • [job] [logs] Jest Tests #10 / ShareToSpaceFlyout with enableCreateCopyCallout shows a warning callout when the saved object only has one namespace

Metrics [docs]

✅ unchanged

History

cc @viduni94

Contributor

@TinaHeiligers TinaHeiligers left a comment

config/serverless.oblt.yml still has xpack.evals.enabled: true with a TODO comment to remove before merging. That needs to come out.

The three-tier serverless detection (explicit flag, proactive esClient.info() check, reactive catch-and-retry) is well thought out and the tests cover all three paths plus the caching behavior. The rest looks good to me.

@viduni94
Contributor Author

viduni94 commented Apr 21, 2026

> config/serverless.oblt.yml still has xpack.evals.enabled: true with a TODO comment to remove before merging. That needs to come out.

Thanks for the review @TinaHeiligers
I have already removed xpack.evals.enabled: true in 05c6a8c after testing.

> The three-tier serverless detection (explicit flag, proactive esClient.info() check, reactive catch-and-retry) is well thought out and the tests cover all three paths plus the caching behavior. The rest looks good to me.

Thank you 🙏🏻

Contributor

@SrdjanLL SrdjanLL left a comment

LGTM - code review only!

I think in the long term we should have evals indices registered as system indices in ES (not a blocker at this early stage though).

I've created an issue to track this #264945 - feel free to add more context as needed.

Contributor

@TinaHeiligers TinaHeiligers left a comment

Looks ok to me

@viduni94 viduni94 merged commit 6097295 into elastic:main Apr 22, 2026
18 checks passed
smith pushed a commit to smith/kibana that referenced this pull request Apr 23, 2026
…astic#264760)
rbrtj pushed a commit to walterra/kibana that referenced this pull request Apr 27, 2026
…astic#264760)
SoniaSanzV pushed a commit to SoniaSanzV/kibana that referenced this pull request Apr 27, 2026
…astic#264760)
Labels

backport:skip This PR does not require backporting ci:project-deploy-observability Create an Observability project release_note:skip Skip the PR/issue when compiling release notes v9.5.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[StorageIndexAdapter] Index template creation fails on Serverless for non-system indices

6 participants